Abstract: We address the problem of finding optimal strategies
for computing Boolean symmetric functions. We consider a col- 
located network, where each node's transmissions can be heard 
by every other node. Each node has a Boolean measurement 
and we wish to compute a given Boolean function of these 
measurements with zero error. We allow for block computation 
to enhance data fusion efficiency, and determine the minimum 
worst-case total bits to be communicated to perform the desired 
computation. We restrict attention to the class of symmetric 
Boolean functions, which only depend on the number of 1s among
the n measurements. 

We define three classes of functions, namely threshold func- 
tions, delta functions and interval functions. We provide exactly 
optimal strategies for the first two classes, and an order-optimal 
strategy with optimal preconstant for interval functions. Using 
these results, we can characterize the complexity of computing 
percentile type functions, which is of great interest. In our 
analysis, we use lower bounds from communication complexity 
theory, and provide an achievable scheme using information 
theoretic tools. 

I. INTRODUCTION 

Wireless sensor networks are composed of nodes with 
sensing, wireless communication and computation capabili- 
ties. These networks are designed for applications like fault 
monitoring, data harvesting and environmental monitoring. In 
these applications, one is interested only in computing some 
relevant function of the measurements. For example, one might 
want to compute the mean temperature for environmental 
monitoring, or the maximum temperature in fire alarm sys- 
tems. This suggests moving away from a data forwarding 
paradigm, and focusing on efficient in-network computation 
and communication strategies for the function of interest. 

The problem of computing functions of distributed data in 
sensor networks presents several challenges. On the one hand, 
the wireless medium being a broadcast medium, nodes have to 
deal with interference from other transmissions. On the other 
hand, nodes can exploit these overheard transmissions, and 
the structure of the function to be computed, to achieve a 
more efficient description of their own data. This is a significant
departure from the traditional decode and forward paradigm. 

We consider a collocated network where each node's transmissions
can be heard by every other node. This could
correspond to a collocated subnet in a sensor network. Each 
node has a Boolean variable and we focus on the specific 
problem of symmetric Boolean function computation. We 
adopt a deterministic formulation of the problem of function 
computation, allowing zero error. We consider the problem of 
worst-case function computation, without imposing a proba- 
bility distribution on the node measurements. Further, instead 
of restricting a strategy to compute just one instance of the 
problem, we allow for a block of N independent instances 
for which the function is to be computed. Thus nodes can 
accumulate a block of N measurements, and realize greater 
efficiency by using block codes. 

We assume a packet capture model as in [1], where collisions
do not convey information. Thus, the problem of medium
access is resolved by allowing at most one node to transmit 
successfully at any time. The set of admissible strategies 
includes all interactive strategies, where a node may exchange 
several messages with other nodes. It is of particular interest to 
study the benefit of interactive strategies versus single-round 
strategies, where each node transmits only one message. 

We begin with the problem of computing the Boolean AND
function of two variables in Section III. This problem was
studied in [2], where it was shown that the exact communication
complexity is $\log_2 3$ bits, for block computation. The
lower bound was established using fooling sets, and a novel 
achievable scheme was presented which minimizes the worst 
case total number of bits exchanged. The proof technique 
outlined above can be extended to the problem of computing 
the AND function of n Boolean variables. 

In Section IV, we consider threshold functions, which
evaluate to 1 if and only if the total number of 1s is at least
a certain threshold. For this class of functions, we devise 
an achievable strategy which involves each node transmitting 
in turn using a prefix-free codebook. Further, by intelligent 
construction of a fooling set, we obtain the exact complexity 
of computing threshold functions. It is interesting to note that 
the optimal strategy requires no back-and-forth interaction 
between nodes. In Section IV-A, we also obtain the exact
complexity of computing delta functions, which evaluate to
1 if and only if there are exactly a certain number of 1s.

In Section V, we study the complexity of computing interval
functions, which evaluate to 1 if and only if the total number
of 1s belongs to a given interval $[a,b]$. For a fixed interval
[a,b], the proposed strategy for achievability is order-optimal 
with optimal preconstant. Additionally, for the interesting class 
of percentile functions, the proposed single-round strategy is 
order optimal. The results can be easily extended to the case 
of many intervals and further, to the case of non-Boolean 
alphabets. 

II. RELATED WORK 

The problem of worst-case block function computation was 
formulated in [1]. The authors identify two classes of symmetric
functions, namely type-sensitive functions, exemplified by
Mean and Median, and type-threshold functions, exemplified
by Maximum and Minimum. The maximum rates for computation
of type-sensitive and type-threshold functions in random
planar networks are shown to be $\Theta(1/\log n)$ and
$\Theta(1/\log\log n)$, respectively, for a network of $n$ nodes. A communication
complexity approach was used to establish upper bounds on
the rate of computation in collocated networks.

In communication complexity [3], one seeks to minimize
the number of bits that must be exchanged between two nodes
to achieve worst-case zero-error computation of a function
of the node variables. The communication complexity of
Boolean functions has been studied in [4], [5]. Further, one can
consider the direct-sum problem [6], where several instances of
the problem are considered together to obtain savings. This
block computation approach is used to compute the exact
complexity of the Boolean AND function in [2]. In this paper,
we considerably generalize this result, which allows us to 
derive optimal strategies for computing more general classes 
of symmetric Boolean functions in collocated networks. 

While we have considered worst case computation in this 
paper, one could also impose a probability distribution on the 
measurements. In [7], the average complexity of computing a
type-threshold function was shown to be $O(1)$, in contrast with
the worst case complexity of $\Theta(\log n)$. Thus, we can obtain
constant rate computation on the average. 

As argued in 0, an information-theoretic formulation of 
this problem combines the complexity of source coding with 
rate distortion as well as the manifold collaborative possibili- 
ties in wireless, together with the complications introduced by 
the function structure. There is little or no work that addresses 
this most general framework. One special case, a source coding 
problem for function computation with side information, has 
been studied in [8]. Recently, the rate region for multi-round
interactive function computation has been characterized for
two nodes [9], and for collocated networks [10].

III. ZERO-ERROR BLOCK COMPUTATION OF THE AND FUNCTION

A. General problem setting 

Consider a collocated network with nodes 1 through $n$,
where each node $i$ has a Boolean measurement $X_i \in \{0,1\}$.
Every node wants to compute the same function $f(X_1, X_2, \ldots, X_n)$
of the measurements. We seek to find communication schemes
which achieve correct function computation at each node,
with minimum worst-case total number of bits exchanged. We
allow for the efficiencies of block computation, where each
node $i$ has a block of $N$ independent measurements, denoted
by $X_i^N$. Throughout this paper, we consider the broadcast
scenario, where each node's transmission can be heard by
every other node. We also suppose that collisions do not
convey information, thus restricting ourselves to collision-free
strategies as in [1]. This means that for the $k$th bit $b_k$, the
identity of the transmitting node $T_k$ depends only on previously
broadcast bits $b_1, b_2, \ldots, b_{k-1}$, while the value of the bit it
sends can depend arbitrarily on all previous broadcast bits as
well as its block of measurements $X_{T_k}^N$.

It is important to note that all interactive strategies are subsumed
within the class of collision-free strategies. A collision-free
strategy is said to achieve correct block computation if
each node $i$ can correctly determine the value of the function
block $f^N(X_1, X_2, \ldots, X_n)$ using the sequence of bits $b_1, b_2, \ldots$
and its own measurement block $X_i^N$. Let $\mathcal{S}_N$ be the class of
collision-free strategies for block length $N$ which achieve zero-error
block computation, and let $C(f, S_N, N)$ be the worst-case
total number of bits exchanged under strategy $S_N \in \mathcal{S}_N$. The
worst-case per-instance complexity of computing a function
$f(X_1, X_2, \ldots, X_n)$ is defined by

$$C(f) = \lim_{N \to \infty} \min_{S_N \in \mathcal{S}_N} \frac{C(f, S_N, N)}{N}.$$

We call this the broadcast computation complexity of the
function $f$.

B. Complexity of computing $X_1 \wedge X_2$

Before we can address the general problem of computing
symmetric Boolean functions, we consider the specific problem
of computing the AND function, which is 1 if all its
arguments are 1, and 0 otherwise. We start by considering
just two nodes, namely 1 and 2, with measurement blocks $X_1^N$
and $X_2^N$, and we seek to compute the element-wise AND of the
two blocks, denoted by $\wedge^N(X_1, X_2)$. This problem was studied
in [2], and we briefly review the proof.

Theorem 1: Given any strategy $S_N$ for block computation
of $X_1 \wedge X_2$,

$$C(X_1 \wedge X_2, S_N, N) \ge N \log_2 3.$$

Further, there exists a strategy $S_N^*$ which satisfies

$$C(X_1 \wedge X_2, S_N^*, N) \le \lceil N \log_2 3 \rceil.$$

Thus, the complexity of computing $X_1 \wedge X_2$ is given by
$C(X_1 \wedge X_2) = \log_2 3$.

Proof of achievability: Suppose node 1 transmits first using
a prefix-free codebook. Let the length of the codeword
transmitted be $l(X_1^N)$. At the end of this transmission, both
nodes know the value of the function at the instances where
$X_1 = 0$. Thus node 2 only needs to indicate its bits for the
instances of the block where $X_1 = 1$. The total number
of bits exchanged under this scheme is therefore $l(X_1^N) + w(X_1^N)$,
where $w(X_1^N)$ is the number of 1s in $X_1^N$. For a given scheme, let us
define

$$L := \max_{X_1^N} \left( l(X_1^N) + w(X_1^N) \right)$$

to be the worst case total number of bits exchanged. We are
interested in finding the codebook which will result in the
minimum worst-case number of bits.

Any prefix-free code must satisfy the Kraft inequality,
given by $\sum_{X_1^N} 2^{-l(X_1^N)} \le 1$. Consider a codebook with
$l(X_1^N) = \lceil N \log_2 3 \rceil - w(X_1^N)$. This satisfies the Kraft
inequality since $\sum_{X_1^N} 2^{w(X_1^N)} = 3^N$. Hence there exists a valid
prefix-free code for which the worst case number of bits exchanged is
$\lceil N \log_2 3 \rceil$, which establishes that $C(X_1 \wedge X_2) \le \log_2 3$.
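
As a quick numerical check of the codelength assignment above, the following Python sketch enumerates every block of node 1 for a small block length and evaluates both the Kraft sum and the worst-case total number of bits. The function name `and_codebook_check` and the choice of block length are illustrative assumptions, not part of the original scheme.

```python
from fractions import Fraction
from itertools import product
from math import ceil, log2


def and_codebook_check(N):
    """Return (Kraft sum, worst-case total bits) for node 1's codebook.

    The codeword length for a block x is ceil(N*log2(3)) - w(x), where
    w(x) is the number of 1s in x; node 2 then sends w(x) further bits.
    """
    budget = ceil(N * log2(3))
    kraft = Fraction(0)
    worst = 0
    for x in product((0, 1), repeat=N):
        w = sum(x)
        length = budget - w               # codeword length l(x)
        kraft += Fraction(1, 2 ** length)
        worst = max(worst, length + w)    # total bits for this block
    return kraft, worst


kraft, worst = and_codebook_check(3)
print(kraft, worst)  # the Kraft sum equals 3**N / 2**budget
```

For any block length the Kraft sum is $3^N / 2^{\lceil N \log_2 3 \rceil} \le 1$, and every block costs exactly $\lceil N \log_2 3 \rceil$ bits in total.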

The lower bound is shown by constructing a fooling set [5]
of the appropriate size. We digress briefly to introduce the
concept of fooling sets in the context of two-party communication
complexity [3]. Consider two nodes $X$ and $Y$, which
take values in finite sets $\mathcal{X}$ and $\mathcal{Y}$ respectively, and
suppose both nodes want to compute some function $f(X,Y)$ with zero error.

Definition 1 (Fooling Set): A set $E \subseteq \mathcal{X} \times \mathcal{Y}$ is said to be
a fooling set, if for any two distinct elements $(x_1,y_1)$, $(x_2,y_2)$
in $E$, we have either

• $f(x_1,y_1) \ne f(x_2,y_2)$, or

• $f(x_1,y_1) = f(x_2,y_2)$, but either $f(x_1,y_2) \ne f(x_1,y_1)$ or
$f(x_2,y_1) \ne f(x_1,y_1)$.

Given a fooling set $E$ for a function $f(X_1,X_2)$, we have
$C(f(X_1,X_2)) \ge \log_2 |E|$. We have described two-dimensional
fooling sets above. The extension to multi-dimensional fooling
sets is straightforward and gives a lower bound on the communication
complexity of the function $f(X_1, X_2, \ldots, X_n)$.
Lower bound for Theorem 1: We define the measurement
matrix $M$ to be the matrix obtained by stacking the row
$X_1^N$ over the row $X_2^N$. Thus we need to find a subset of
the set of all measurement matrices which forms a fooling
set. Let $E$ be the set of all measurement matrices which are
made up of only the column vectors
$\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
We claim that $E$ is the appropriate fooling set. Consider two
distinct measurement matrices $M_1, M_2 \in E$. Let $f^N(M_1)$ and
$f^N(M_2)$ be the block function values obtained from these
two matrices. If $f^N(M_1) \ne f^N(M_2)$, we are done. Let us
suppose $f^N(M_1) = f^N(M_2)$. Since $M_1 \ne M_2$, there must
exist one column where (relabeling $M_1$ and $M_2$ if necessary) $M_1$ has
$\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ but $M_2$ has $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$.
Now if we replace the first row of $M_1$ with the first row of
$M_2$, the resulting measurement matrix, say $M^*$, is such that
$f^N(M^*) \ne f^N(M_1)$. Thus, the set $E$ is a valid fooling set. It is
easy to verify that $E$ has cardinality $3^N$. Thus, for any
strategy $S_N \in \mathcal{S}_N$, we must have $C(X_1 \wedge X_2, S_N, N) \ge N \log_2 3$,
implying that $C(X_1 \wedge X_2) \ge \log_2 3$. This concludes the proof of
Theorem 1. □
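
The fooling set property used in this proof can be checked by brute force for a small block length. The Python sketch below builds all $2 \times N$ measurement matrices from a given list of columns and tests the two-party condition of Definition 1 on every pair. The helper names (`and_block`, `matrices_from_columns`, `is_fooling_set`) are hypothetical and chosen only for this illustration.

```python
from itertools import combinations, product


def and_block(m):
    """Element-wise AND of the two rows of a 2 x N measurement matrix."""
    row1, row2 = m
    return tuple(a & b for a, b in zip(row1, row2))


def matrices_from_columns(columns, N):
    """All 2 x N matrices, stored as (row1, row2), built from the columns."""
    mats = []
    for choice in product(columns, repeat=N):
        row1 = tuple(c[0] for c in choice)
        row2 = tuple(c[1] for c in choice)
        mats.append((row1, row2))
    return mats


def is_fooling_set(mats, f):
    """Check the two-party fooling set condition for every distinct pair."""
    for (x1, y1), (x2, y2) in combinations(mats, 2):
        if f((x1, y1)) != f((x2, y2)):
            continue
        same = f((x1, y1))
        # Swapping one node's block must change the function value.
        if f((x1, y2)) == same and f((x2, y1)) == same:
            return False
    return True


E = matrices_from_columns([(0, 1), (1, 0), (1, 1)], N=2)
print(len(E), is_fooling_set(E, and_block))
```

Adding the all-zero column to the list breaks the property, which is why the construction excludes it.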

Corollary 1: The complexity of the OR function is given
by $C(X_1 \vee X_2) = \log_2 3$, since we can view it as
$\overline{\bar{X}_1 \wedge \bar{X}_2}$, by De Morgan's laws.

The above approach can be easily extended to the
general AND function of $n$ variables, and we obtain
$C(\wedge(X_1, X_2, \ldots, X_n)) = \log_2(n+1)$. We now proceed to provide
an exact result for a more general class of functions, called
threshold functions, which includes AND as a special case.
Note: Throughout the rest of the paper, for ease of exposition,
we will ignore the fact that terms like $N \log_2(n+1)$ may not
be integer. Since our achievability strategy involves each node
transmitting exactly once, this will result in a maximum of
one extra bit per node, and since we are amortizing this over
a long block length $N$, it will not affect any of the results.

IV. COMPLEXITY OF COMPUTING BOOLEAN THRESHOLD FUNCTIONS

Definition 2 (Boolean threshold functions): A Boolean
threshold function $\Pi_\theta(X_1, X_2, \ldots, X_n)$ is defined as

$$\Pi_\theta(X_1, X_2, \ldots, X_n) = \begin{cases} 1 & \text{if } \sum_i X_i \ge \theta, \\ 0 & \text{otherwise.} \end{cases}$$



Theorem 2: The complexity of computing a Boolean
threshold function is $C(\Pi_\theta(X_1, X_2, \ldots, X_n)) = \log_2 \binom{n+1}{\theta}$.

Proof of achievability: The upper bound is established by
induction on $n$. From Theorem 1 and Corollary 1, the result
is true for $n = 2$ and for all $1 \le \theta \le n$, which is the basis
step. Suppose the upper bound is true for a collocated network
of $(n-1)$ nodes, for all $1 \le \theta \le (n-1)$. Given a function
$\Pi_\theta(X_1, X_2, \ldots, X_n)$ of $n$ variables, consider an achievable
strategy in which node $n$ transmits first, using a prefix-free
codeword of length $l(X_n^N)$. After this transmission, nodes 1 through
$n-1$ can decode the block $X_n^N$. For the instances where $X_n = 0$,
these $(n-1)$ nodes now need to compute $\Pi_\theta(X_1, X_2, \ldots, X_{n-1})$.
For the instances where $X_n = 1$, the remaining $(n-1)$ nodes
need to compute $\Pi_{\theta-1}(X_1, X_2, \ldots, X_{n-1})$. From the induction
hypothesis, we have optimal strategies for computing these
functions. Let $w^i(X_n^N)$ denote the number of instances of $i$ in
the block $X_n^N$. Under the above strategy, the worst-case total
number of bits exchanged is

$$L = \max_{X_n^N} \left( l(X_n^N) + w^0(X_n^N) \log_2 \binom{n}{\theta} + w^1(X_n^N) \log_2 \binom{n}{\theta-1} \right).$$

We want to minimise this quantity subject to the Kraft
inequality. Consider a prefix-free codebook which satisfies

$$l(X_n^N) = N \log_2 \binom{n+1}{\theta} - w^0(X_n^N) \log_2 \binom{n}{\theta} - w^1(X_n^N) \log_2 \binom{n}{\theta-1}.$$

This assignment of codelengths satisfies the Kraft inequality
since

$$\sum_{X_n^N} 2^{-l(X_n^N)} = \binom{n+1}{\theta}^{-N} \sum_{X_n^N} \binom{n}{\theta}^{w^0(X_n^N)} \binom{n}{\theta-1}^{w^1(X_n^N)} = \binom{n+1}{\theta}^{-N} \left[ \binom{n}{\theta} + \binom{n}{\theta-1} \right]^N = 1.$$

Hence there exists a prefix-free code which satisfies the
specified codelengths, and we have $L = N \log_2 \binom{n+1}{\theta}$,
which proves the induction step.
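
The Kraft computation in this induction step reduces to Pascal's identity, which can be confirmed exactly with integer arithmetic. The following Python sketch groups the blocks of node $n$ by their number of 1s and evaluates the Kraft sum as a fraction. The function name `kraft_sum_threshold` is an assumption made for this illustration, and the codelengths are treated as real numbers, in line with the note on integrality above.

```python
from fractions import Fraction
from math import comb


def kraft_sum_threshold(n, theta, N):
    """Exact Kraft sum for node n's codebook in the threshold scheme.

    A block with w1 ones and w0 = N - w1 zeros has
    2**(-l) = C(n, theta)**w0 * C(n, theta - 1)**w1 / C(n + 1, theta)**N,
    and there are C(N, w1) such blocks.
    """
    total = 0
    for w1 in range(N + 1):
        w0 = N - w1
        total += comb(N, w1) * comb(n, theta) ** w0 * comb(n, theta - 1) ** w1
    return Fraction(total, comb(n + 1, theta) ** N)


print(kraft_sum_threshold(5, 2, 4))
```

The sum equals 1 for every admissible $n$, $\theta$ and $N$, so the codelength budget is used with no slack.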
Proof of lower bound: We need to find a subset of the set of
all $n \times N$ measurement matrices which is a valid fooling set.
Consider the subset $E$ of measurement matrices which consist
of only columns which sum to $(\theta - 1)$ or $\theta$. Since there are $N$
columns, there are

$$\left[ \binom{n}{\theta-1} + \binom{n}{\theta} \right]^N = \binom{n+1}{\theta}^N$$

such matrices. We claim that the set $E$ is a valid fooling set. Let $M_1$, $M_2$
be two distinct matrices in this subset. If $f^N(M_1) \ne f^N(M_2)$,
then we are done. Suppose not. Then there must exist at least
one column at which $M_1$ and $M_2$ disagree, say $M_1^{(j)} \ne M_2^{(j)}$.
However, both $M_1^{(j)}$ and $M_2^{(j)}$ have the same number of ones.
Thus there must exist some row, say $i^*$, where $M_1^{(j)}$ has a zero,
but $M_2^{(j)}$ has a one.

(i) Suppose $f(M_1^{(j)}) = f(M_2^{(j)}) = 0$. Then, consider the matrix
$M_1^*$ obtained by replacing the $i^*$th row of $M_1$ with the $i^*$th
row of $M_2$. The $j$th column of $M_1^*$ has $\theta$ ones, and hence
$f(M_1^{*(j)}) = 1$. Hence we have $f^N(M_1^*) \ne f^N(M_1)$.

(ii) Suppose $f(M_1^{(j)}) = f(M_2^{(j)}) = 1$. Then, consider the matrix
$M_2^*$ obtained by replacing the $i^*$th row of $M_2$ with the $i^*$th
row of $M_1$. The $j$th column of $M_2^*$ has $\theta - 1$ ones, and hence
$f(M_2^{*(j)}) = 0$. Hence we have $f^N(M_2^*) \ne f^N(M_2)$.

Thus, the set $E$ is a valid fooling set. From the fooling
set lower bound, for any strategy $S_N \in \mathcal{S}_N$, we must
have $C(\Pi_\theta(X_1, X_2, \ldots, X_n), S_N, N) \ge N \log_2 \binom{n+1}{\theta}$,
implying that $C(\Pi_\theta(X_1, X_2, \ldots, X_n)) \ge \log_2 \binom{n+1}{\theta}$. □

A. Complexity of Boolean delta functions

Definition 3 (Boolean delta function): A Boolean delta
function $\Pi_{\{\theta\}}(X_1, X_2, \ldots, X_n)$ is defined as:

$$\Pi_{\{\theta\}}(X_1, X_2, \ldots, X_n) = \begin{cases} 1 & \text{if } \sum_i X_i = \theta, \\ 0 & \text{otherwise.} \end{cases}$$

Theorem 3: The complexity of computing
$\Pi_{\{\theta\}}(X_1, X_2, \ldots, X_n)$ is given by

$$C(\Pi_{\{\theta\}}(X_1, X_2, \ldots, X_n)) = \log_2 \left[ \binom{n+1}{\theta} + \binom{n}{\theta+1} \right].$$

Sketch of Proof: The proof of achievability follows from an
inductive argument as before. The fooling set $E$ consists of
measurement matrices composed of only columns which sum
up to $\theta - 1$, $\theta$ or $\theta + 1$. Thus the size of the fooling set is

$$\left[ \binom{n}{\theta-1} + \binom{n}{\theta} + \binom{n}{\theta+1} \right]^N.$$ □

V. COMPLEXITY OF COMPUTING BOOLEAN INTERVAL FUNCTIONS

A Boolean interval function $\Pi_{[a,b]}(X_1, \ldots, X_n)$ is defined as:

$$\Pi_{[a,b]}(X_1, X_2, \ldots, X_n) = \begin{cases} 1 & \text{if } a \le \sum_i X_i \le b, \\ 0 & \text{otherwise.} \end{cases}$$

A naive strategy to compute the function $\Pi_{[a,b]}(X_1, \ldots, X_n)$
is to compute the threshold functions $\Pi_a(X_1, \ldots, X_n)$ and
$\Pi_{b+1}(X_1, X_2, \ldots, X_n)$. However, this strategy gives us more
information than we seek, i.e., if $\sum_i X_i \in [a,b]^c$, then we
also know if $\sum_i X_i < a$, which is superfluous information and
perhaps costly to obtain. Alternately, we can derive a strategy
which explicitly deals with intervals, as against thresholds.
This strategy has significantly lower complexity.

Theorem 4: The complexity of computing a Boolean interval
function $\Pi_{[a,b]}(X_1, X_2, \ldots, X_n)$ with $a + b \le n$ is bounded
as follows:

$$\log_2 \left[ \binom{n+1}{b+1} + \binom{n}{a-1} \right] \le C(\Pi_{[a,b]}(X_1, X_2, \ldots, X_n)) \le \log_2 \left[ \binom{n+1}{b+1} + (b-a+1) \binom{n}{a-1} \right]. \qquad (1)$$

The complexity of computing a Boolean interval function
$\Pi_{[a,b]}(X_1, \ldots, X_n)$ with $a + b \ge n$ is bounded as follows:

$$\log_2 \left[ \binom{n+1}{a} + \binom{n}{b+1} \right] \le C(\Pi_{[a,b]}(X_1, X_2, \ldots, X_n)) \le \log_2 \left[ \binom{n+1}{a} + (b-a+1) \binom{n}{b+1} \right]. \qquad (2)$$

Proof of lower bound: Suppose $a + b \le n$. Consider the
subset $E$ of measurement matrices which consist of only
columns which sum to $(a - 1)$, $b$ or $(b + 1)$. We claim that
the set $E$ is a valid fooling set. Let $M_1$, $M_2$ be two distinct
matrices in this subset. If $f^N(M_1) \ne f^N(M_2)$, we are done.
Suppose not. Then there must exist at least one column at
which $M_1$ and $M_2$ disagree, say $M_1^{(j)} \ne M_2^{(j)}$.

(i) Suppose $f(M_1^{(j)}) = f(M_2^{(j)}) = 1$. Then, both $M_1^{(j)}$ and
$M_2^{(j)}$ have exactly $b$ 1s. Thus there exists some row, say $i^*$,
where $M_1^{(j)}$ has a 0, but $M_2^{(j)}$ has a 1. Consider the matrix
$M_1^*$ obtained by replacing the $i^*$th row of $M_1$ with the $i^*$th
row of $M_2$. The $j$th column of $M_1^*$ has $(b+1)$ 1s, and hence
$f(M_1^{*(j)}) = 0$, which means $f^N(M_1^*) \ne f^N(M_1)$.

(ii) Suppose $f(M_1^{(j)}) = f(M_2^{(j)}) = 0$. If both $M_1^{(j)}$ and $M_2^{(j)}$
have the same number of 1s, then the same argument as in (i)
applies. However, if $M_1^{(j)}$ has $(a - 1)$ 1s and $M_2^{(j)}$ has $(b + 1)$
1s, then there exists some row $i^*$ where $M_1^{(j)}$ has a 0, but $M_2^{(j)}$
has a 1. Then, the matrix $M_2^*$, obtained by replacing the $i^*$th
row of $M_2$ with the $i^*$th row of $M_1$, is such that $f^N(M_2^*) \ne f^N(M_2)$.

Thus, the set $E$ is a valid fooling set and
$|E| = \left[ \binom{n+1}{b+1} + \binom{n}{a-1} \right]^N$. This gives us the
required lower bound in (1).

For the case where $a + b \ge n$, we consider the fooling set
$E'$ of matrices which are comprised of only columns which
sum to $a - 1$, $a$ or $b + 1$. This gives us the lower bound in (2).
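
For a single instance ($N = 1$) the interval fooling set can be enumerated and checked directly. The Python sketch below does this under an assumed form of the $n$-party fooling condition: two elements with the same function value must admit a mixture, taking each node's bit from either element, whose function value differs. This is one common way to state the multi-dimensional extension and is an assumption of the sketch, as are the helper names.

```python
from itertools import combinations, product
from math import comb


def interval_fn(a, b):
    """Boolean interval function: 1 iff a <= number of 1s <= b."""
    return lambda x: int(a <= sum(x) <= b)


def columns_with_sums(n, sums):
    """All length-n Boolean columns whose number of 1s lies in `sums`."""
    return [x for x in product((0, 1), repeat=n) if sum(x) in sums]


def is_fooling_set(cols, f):
    """Assumed n-party condition: each pair with equal value has a mixture
    (every coordinate taken from one of the two) with a different value."""
    n = len(cols[0])
    for x, y in combinations(cols, 2):
        if f(x) != f(y):
            continue
        mixtures = (tuple(x[i] if pick[i] else y[i] for i in range(n))
                    for pick in product((0, 1), repeat=n))
        if all(f(z) == f(x) for z in mixtures):
            return False
    return True


n, a, b = 5, 2, 3  # a + b <= n, so the set for bound (1) applies
E = columns_with_sums(n, {a - 1, b, b + 1})
print(len(E), comb(n + 1, b + 1) + comb(n, a - 1),
      is_fooling_set(E, interval_fn(a, b)))
```

The enumerated size matches the closed form in (1), and the block version follows by applying the same argument column by column.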
Proof of achievability: Consider the general strategy for
achievability where node $n$ transmits a prefix-free codeword
of length $l(X_n^N)$, leaving the remaining $(n-1)$ nodes the
task of computing a residual function. This approach yields a
recursion for computing the complexity of interval functions:

$$C(\Pi_{[a,b]}(X_1, \ldots, X_n)) \le \log_2 \left[ 2^{C(\Pi_{[a-1,b-1]}(X_1, \ldots, X_{n-1}))} + 2^{C(\Pi_{[a,b]}(X_1, \ldots, X_{n-1}))} \right].$$

The boundary conditions for this recursion are obtained from
the result for Boolean threshold functions in Theorem 2. We
could simply solve this recursion computationally, but we want
to study the behaviour of the complexity as we vary $a$, $b$
and $n$. Define $f(a,b,n) := 2^{C(\Pi_{[a,b]}(X_1, \ldots, X_n))}$. We have the following
recursion for $f(a,b,n)$:

$$f(a,b,n) \le f(a-1,b-1,n-1) + f(a,b,n-1). \qquad (3)$$
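
Solving the recursion computationally is straightforward. The Python sketch below treats (3) as an equality, which gives the cost of this particular recursive scheme, and stops at threshold-type boundary cases using Theorem 2. The function names and the exact handling of the boundary cases are assumptions of the sketch; it expects $1 \le a \le b \le n$.

```python
from functools import lru_cache
from math import comb, log2


@lru_cache(maxsize=None)
def f_rec(a, b, n):
    """2**C achieved by the recursive scheme for [a, b] on n nodes."""
    if a <= 0 and b >= n:
        return 1                   # constant function: nothing to send
    if b >= n:
        return comb(n + 1, a)      # threshold "sum >= a" (Theorem 2)
    if a <= 0:
        return comb(n + 1, b + 1)  # complement of threshold "sum >= b + 1"
    # The last node broadcasts; the rest solve the two residual problems.
    return f_rec(a - 1, b - 1, n - 1) + f_rec(a, b, n - 1)


def upper_bound(a, b, n):
    """Closed-form upper bound on 2**C from Theorem 4."""
    if a + b <= n:
        return comb(n + 1, b + 1) + (b - a + 1) * comb(n, a - 1)
    return comb(n + 1, a) + (b - a + 1) * comb(n, b + 1)


print(f_rec(2, 3, 5), upper_bound(2, 3, 5), log2(f_rec(2, 3, 5)))
```

For the delta function with $n = 4$ and $\theta = 2$ the recursion returns 14, which agrees with Theorem 3.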



We proceed by induction on $n$. From Theorems 2 and 3, the
upper bounds in (1) and (2) are true for $n = 2$ and all intervals
$[a,b]$. Suppose the upper bound is true for all intervals $[a,b]$
for $(n-1)$ nodes. Consider the following cases.

(i) Suppose $a + b \le n - 1$. Substituting the induction hypothesis
in (3), we get

$$\begin{aligned} f(a,b,n) &\le \binom{n}{b} + (b-a+1) \binom{n-1}{a-2} + \binom{n}{b+1} + (b-a+1) \binom{n-1}{a-1} \\ &= \binom{n+1}{b+1} + (b-a+1) \binom{n}{a-1}. \end{aligned}$$

(ii) Suppose $a + b \ge n + 1$. Proof is similar to case (i).

(iii) Suppose $a + b = n$. Substituting the induction hypothesis
in (3), we get

$$\begin{aligned} f(a,b,n) &\le \binom{n}{b} + (b-a+1) \binom{n-1}{a-2} + \binom{n}{a} + (b-a+1) \binom{n-1}{b+1} \\ &\le \binom{n+1}{a} + (b-a+1) \binom{n}{b+1}, \end{aligned}$$

where some steps have been omitted in the proof of the last
inequality. When $a + b = n$, the right-hand sides of (1) and (2)
coincide. This establishes the induction step and completes
the proof. □

A. Discussion of Theorem 4

(a) The gap between the lower and upper bounds in (1) and
(2) is additive, and is upper bounded by $\log_2(b - a + 2)$, which
is $\log_2(n + 2)$ in the worst case.

(b) For fixed $a$ and $b$, as the number of nodes increases, we
have $a + b \le n$ for large enough $n$. Consider the residual term
$(b-a+1) \binom{n}{a-1}$ on the RHS in (1). We have

$$(b-a+1) \binom{n}{a-1} = o\left( \binom{n+1}{b+1} \right).$$

Hence, $C(\Pi_{[a,b]}(X_1, \ldots, X_n)) = \log_2 \binom{n+1}{b+1} (1 + o(1))$.

Thus, for any fixed interval $[a,b]$, we have derived an order
optimal strategy with optimal preconstant. The orderwise
complexity of this strategy is the same as that of the threshold
function $\Pi_{b+1}(X_1, \ldots, X_n)$. Similarly, we can derive order
optimal strategies for computing $\Pi_{[n-a,n-b]}(X_1, \ldots, X_n)$ and
$\Pi_{[a,n-b]}(X_1, \ldots, X_n)$, for fixed $a$ and $b$.
(c) Consider a percentile type function where $[a,b] = [\alpha n, \beta n]$,
with $(\alpha + \beta) \le 1$. Using Stirling's approximation, we can still
show that

$$C(\Pi_{[\alpha n, \beta n]}(X_1, \ldots, X_n)) = \log_2 \binom{n+1}{\beta n + 1} (1 + o(1)).$$

Thus we have derived an order optimal strategy with optimal
preconstant for percentile functions.

(d) Consider the function $f := \Pi_{\cup_i [a_i,b_i]}(X_1, \ldots, X_n)$ where the
intervals $[a_i,b_i]$ are disjoint, and may be fixed or percentile
type. We can piece together the result for single intervals and
show that

$$C(f(X_1, \ldots, X_n)) = \log_2 \left( \sum_i g(a_i, b_i, n) (1 + o(1)) \right),$$

where

$$g(a_i, b_i, n) = \begin{cases} \binom{n+1}{b_i+1} & \text{if } a_i + b_i \le n, \\ \binom{n+1}{a_i} & \text{if } a_i + b_i \ge n. \end{cases}$$
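
The behaviour described in (a) and (b) can be explored numerically. The following Python sketch evaluates the lower and upper bounds of Theorem 4 on $2^{C}$ and reports their additive gap in bits for a fixed interval as $n$ grows. The function names and the sample interval $[2,3]$ are illustrative assumptions.

```python
from math import comb, log2


def theorem4_bounds(a, b, n):
    """Lower and upper bounds on 2**C from (1) and (2), 1 <= a <= b <= n."""
    if a + b <= n:
        main, residual = comb(n + 1, b + 1), comb(n, a - 1)
    else:
        main, residual = comb(n + 1, a), comb(n, b + 1)
    return main + residual, main + (b - a + 1) * residual


def gap_bits(a, b, n):
    """Additive gap, in bits per instance, between the two bounds."""
    lower, upper = theorem4_bounds(a, b, n)
    return log2(upper / lower)


for n in (10, 100, 1000):
    print(n, gap_bits(2, 3, n))
```

For a fixed interval the gap shrinks as $n$ grows, because the residual term becomes negligible next to the leading binomial coefficient; for a delta function ($a = b$) the two bounds coincide.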



VI. CONCLUDING REMARKS

We have addressed the problem of computing symmetric 
Boolean functions in a collocated network. We have derived 
optimal strategies for computing threshold functions and order 
optimal strategies with optimal preconstant for interval func- 
tions. Thus, we have sharply characterized the complexity of 
various classes of symmetric Boolean functions. Further, since 
the thresholds and intervals are allowed to depend on n, we 
have provided a unified treatment of type-sensitive and type- 
threshold functions. 

The results can be extended in two directions. First, we can 
consider non-Boolean alphabets and functions which depend 
only on Alternately, we can consider non-Boolean 

functions of a Boolean alphabet. The fooling set lower bound 
and the strategy for achievability can be generalized to both 
these cases. 